Question Answering Using XML-Tagged Documents

Author

  • Kenneth C. Litkowski
Abstract

The official submission for CL Research's question-answering system (DIMAP-QA) for TREC-11 only slightly extends its semantic relation triple (logical form) technology in which documents are fully parsed and databases built around discourse entities. We were unable to complete the planned revision of our system based on a fuller discourse analysis of the texts. We have since implemented many of these changes and can now report preliminary and encouraging results of basing our system on XML markup of texts with syntactic and semantic attributes and use of XML stylesheet functionality (specifically, XPath expressions) to answer questions. The official confidence-weighted score for the main TREC-11 QA task was 0.049, based on processing 20 of the top 50 documents provided by NIST. Our estimated mean reciprocal rank was 0.128 for the exact answers and 0.227 for sentence answers, comparable to our results from previous years. With our revised XML-based system, using a 20 percent sample of the TREC questions, we have an estimated confidence-weighted score of 0.869 and mean reciprocal rank of 0.828. We describe our system and examine the results from XML tagging in terms of question-answering and other applications such as information extraction, text summarization, novelty studies, and investigation of linguistic phenomena.
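
To make the idea of answering questions with XPath expressions over XML-tagged text concrete, here is a minimal sketch in Python. It is not the DIMAP-QA implementation: the element names (`sentence`, `np`, `verb`), the attribute names (`role`, `semtype`, `lemma`), the sample sentences, and the helper `answer_who` are all hypothetical, chosen only to illustrate how syntactic and semantic attributes could be queried. It relies on the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative assumptions,
# not the schema used by the system described in the abstract.
DOC = """
<document>
  <sentence id="1">
    <np role="subject" semtype="person">Alexander Graham Bell</np>
    <verb lemma="invent">invented</verb>
    <np role="object" semtype="artifact">the telephone</np>
  </sentence>
  <sentence id="2">
    <np role="subject" semtype="person">Marie Curie</np>
    <verb lemma="discover">discovered</verb>
    <np role="object" semtype="substance">radium</np>
  </sentence>
</document>
"""


def answer_who(root, verb_lemma):
    """Return person-typed subject phrases from sentences whose verb has the given lemma."""
    answers = []
    for sentence in root.findall(".//sentence"):
        # Skip sentences that do not contain the verb the question asks about.
        if sentence.find(f"verb[@lemma='{verb_lemma}']") is None:
            continue
        # A "who" question looks for a subject noun phrase tagged as a person.
        for np in sentence.findall("np[@role='subject'][@semtype='person']"):
            answers.append(np.text)
    return answers


root = ET.fromstring(DOC)
print(answer_who(root, "invent"))  # prints ['Alexander Graham Bell']
```

In this sketch a question such as "Who invented the telephone?" is reduced by hand to a verb lemma plus an expected answer type. A real system would need to derive those constraints from the parsed question.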

Similar resources

Text Summarization Using XML-Tagged Documents

CL Research’s participation in the Document Understanding Conference extended the framework used in the TREC 2003 question-answering track, in which texts are parsed and processed into XML-tagged documents where sentence elements are marked with discourse, syntactic, and semantic attributes. This extension was made primarily to test the viability of using XML-tagged documents for summarization....

Retrieval Using Structure for Question Answering

This paper examines the use of XML for modern extraction-based question answering (QA). We feel that the XML community has taken too narrow a view of structured retrieval, and that examining specific applications such as QA can give the XML retrieval community a broader view of the problems and challenges that structured retrieval faces. In our examination of QA, we argue that the next steps for...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Overview of the INEX 2010 Question Answering Track (QA@INEX)

The INEX QA track (QA@INEX) aims to evaluate a complex question-answering task. In such a task, the set of questions is composed of complex questions that can be answered by several sentences or by an aggregation of texts from different documents. Question-answering, XML/passage retrieval and automatic summarization are combined in order to get closer to real information needs. Based on the grou...

Cooperative XML (CoXML) Query Answering at INEX 03

The Extensible Markup Language (XML) is becoming the most popular format for information representation and data exchange. Much research has been investigated on providing flexible query facilities while aiming at efficient techniques to extract data from XML documents. However, most of them are focused on only the exact matching of query conditions. In this paper, we describe a cooperative XML...

Publication date: 2002